ATUM: a new technique for capturing address traces using microcode
annual international symposium on Computer architecture

14 Issue 2 
Trace-driven simulation is often used in the design of computer systems,
and translation lookaside buffers. Capturing address traces to drive such 
been problematic, often involving 1000:1 software overhead to trace a ta 
and/or mechanisms that cause significant distortions in the recorded data 
for capturing address traces has been developed to use a processor's mici 
addresses in a reserved part of main memory as ... 

Techniques for compressing program address traces 
Andrew R. Pleszkun 

November 1994 Proceedings of the 27th annual international symposiu 

Microarchitecture 
Publisher: ACM Press 






Results (page 1): +address +trace 







In this paper a technique for generating consistent, reproducible traces w 
of magnitude better compression than standard general-purpose compres 
described. With this approach, the trace is read once, an intermediate for 
then read as the input to the second pass over the address stream. No pro 
required, and this technique will work on address streams that include O 
of the way the address trace is encod ... 

Keywords: compression, trace generation 



3 Address trace compression through loop detection and reduction 
E. N. Elnozahy

May 1999 ACM SIGMETRICS Performance Evaluation Review, Pro<

4 Generation and analysis of very long address traces
Anita Borg, R. E. Kessler, David W. Wall

May 1990 ACM SIGARCH Computer Architecture News , Proceeding 
annual international symposium on Computer Architecture 
18 Issue 3a 
Publisher: ACM Press 

Existing methods of generating and analyzing traces suffer from a variet
including complexity, inaccuracy, short length, inflexibility, or applicabi
machines. We use a trace generation mechanism based on link-time cod<
which is simple to use, generates accurate long traces of multi-user progi
RISC machine, and can be flexibly controlled. On-the-fly analysis of the 
get accurate performance data for large second-1 ... 

5 RATCHET: real-time address trace compression hardware for extended tra 
Colleen D. Schieber, Eric E. Johnson

April 1994 ACM SIGMETRICS Performance Evaluation Review, Vol 
Publisher: ACM Press 

The address traces used in computer architecture research are commonly
software techniques that introduce time dilations of an order of magnituc
techniques may also omit classes of memory references that are importai
models of computer systems, such as instruction prefetches, operating sy
and interrupt activity. We describe a technique for capturing all classes o:
time. RATCHET employs trace filtering hardware to redu ... 

Constructing instruction traces from cache-filtered address traces (CITCA1
Charlton D. Rose, J. Kelly Flanagan

December 1996 ACM SIGARCH Computer Architecture News, Volun
Publisher: ACM Press

Instruction traces are useful tools for studying many aspects of computer
are difficult to gather without perturbing the systems being traced. In the
have collected instruction traces through various techniques, including si
instruction inlining, hardware monitoring, and processor simulation. The
however, fail to produce accurate traces because they interfere with the f
execution. Because processors are deterministic ...

TRAPEDS: producing traces for multicomputers via execution driven simi 
C. B. Stunkel, W. K. Fuchs 

April 1989 ACM SIGMETRICS Performance Evaluation Review , Pro 
1989 ACM SIGMETRICS international conference on Mea 
modeling of computer systems SIGMETRICS '89, Volume 

Trace-driven simulation is an important aid in performance analysis of c-
Capturing address traces for these simulations is a difficult problem for s 
and particularly for multicomputers. Even when existing trace methods c 
multicomputers, the amount of collected data typically grows with the m 
so I/O and trace storage costs increase. A new technique is presented in t 
modifies the executable code to dynamically col ... 

8 Session 9: traffic analysis: Observed structure of addresses in IP traffic 
Eddie Kohler, Jinyang Li, Vern Paxson, Scott Shenker

November 2002 Proceedings of the 2nd ACM SIGCOMM Workshop o
Publisher: ACM Press

This paper investigates the structure of addresses contained in IP traffic,
analyze the structural characteristics of destination IP addresses seen on 
considered as a subset of the address space. These characteristics may hz 
algorithms that deal with IP address aggregates, such as routing lookups 
congestion control. We find that address structures are well modeled by . 
Cantor dust with two parameters. The model m ... 

Richard A. Uhlig, Trevor N. Mudge
Publisher: ACM Press

As the gap between processor and memory speeds continues to widen, n
evaluating memory system designs before they are implemented in hard^ 
increasingly important. One such method, trace-driven memory simulati< 
subject of intense interest among researchers and has, as a result, enjoye( 
and substantial improvements during the past decade. This article survey 
developments by establishing criteria for evaluating trac ... 

10 Execution-driven simulation of multiprocessors: address and timing analys
S. Dwarkadas, J. R. Jump, J. B. Sinclair
Volume 4 Issue 4

This article describes and evaluates an efficient execution-driven techniq
simulation of multiprocessors that includes the simulation of system mei 
driven by real program work loads. The technique produces correctly int 
traces at run-time without disk access overhead or hardware support, alk 
simulation of the effects of a variety of architectural alternatives on prog
implemented a simulator based on this technique that offe ... 

Keywords: distributed systems, execution-driven simulation, parallel tra 
memory multiprocessors 



11 Designing a trace format for heap allocation events 
October 2000 ACM SIGPLAN Notices, Proceedings of the 2nd interna
on Memory management ISMM '00, Volume 36 Issue 1 

Publisher: ACM Press 

Dynamic storage allocation continues to play an important role in the pei
correctness of systems ranging from user productivity software to high-p
While algorithms for dynamic storage allocation have been studied for d
literature is based on measuring the performance of benchmark program:
of many important allocation-intensive workloads. Furthermore, to date : 
emerged or been proposed for publishing and exchangin ... 

In the design of SPUR, a high-performance multiprocessor workstation,
caches and hardware-supported cache consistency suggests a new approi 
address translation. By performing translation in each processor's virtual 
need for separate translation lookaside buffers (TLBs) is eliminated. Elir 
substantially reduces the hardware cost and complexity of the translation 
eliminates the translation consistency problem. Trac ... 

Optimal tracing and incremental reexecution for debugging long-running p

16 Memory-wall: Boosting trace cache performance with nonhead miss specu
Stevan Vlaovic, Edward S. Davidson

June 2002 Proceedings of the 16th international conference on Superco 

Publisher: ACM Press 

Trace caches are used to help dynamic branch prediction make multiple
cycle by embedding some of the predictions in the trace. In this work, wi
cache that is capable of delivering a trace consisting of a variable numbe
a linked list mechanism. We evaluate several schemes in the context of a
model that stores decoded instructions. By developing a new classificati( 
accesses, we are able to target those misses t ... 

Keywords: branch prediction, optimization, trace cache, x86 

Eric Rotenberg, Steve Bennett, James E. Smith

As the issue width of superscalar processors is increased, instruction fete
requirements will also increase. It will become necessary to fetch multip 
cycle. Conventional instruction caches hinder this effort because long in: 
are not always in contiguous cache locations. We propose supplementing 
instruction cache with a trace cache. This structure caches traces of the d
stream, so instructions that are otherwise no ... 

Keywords: instruction cache, instruction fetching, multiple branch predi 
processors, trace cache 



18 Address compression through base register caching 
Arvin Park, Matthew Farrens 

November 1990 Proceedings of the 23rd annual workshop and symposi 

Microprogramming and microarchitecture 
Publisher: IEEE Computer Society Press 

This paper presents a technique to reduce processor-to-memory address 1
exploiting temporal and spatial locality in address reference streams. Hi^ 
of address words are cached in base registers at both the processor and n 
it possible to transmit small register indexes between processor and men 
high order address bits themselves. Trace driven simulations indicate tha 
Caching reduces processor-to-me ... 

Keywords: CPU performance, bandwidth, locality, microprocessor syst( 



19 Mache: no-loss trace compaction 
A. D. Samples
Publisher: ACM Press

Execution traces can be significantly compressed using their referencing
observation leads to a technique capable of compressing execution trace; 
magnitude; instruction-only traces are compressed by two orders of mag 
technique is unlike previously reported trace compression techniques in i 
without loss of information and, therefore, does not affect trace-driven si 

20 Techniques for efficient inline tracing on a shared-memory multiprocessor
S. J. Eggers, David R. Keppel, Eric J. Koldinger, Henry M. Levy

April 1990 ACM SIGMETRICS Performance Evaluation Review , Pro 
1990 ACM SIGMETRICS conference on Measurement and 
computer systems SIGMETRICS '90, Volume 18 Issue 1 
Publisher: ACM Press 

While much current research concerns multiprocessor design, few traces
programs are available for analyzing the effect of design trade-offs. Exis 
methods have serious drawbacks: trap-driven methods often slow down ] 
by more than 1000 times, significantly perturbing program behavior; mi( 
modification is faster, but the technique is neither general nor portable. 1 
a new tool, called MPTRACE, for collecting tr ... 

1 Optimally profiling and tracing programs
July 1994 ACM Transactions on Programming Languages and System

This paper describes algorithms for inserting monitoring code to profile ;
These algorithms greatly reduce the cost of measuring programs with res 
commonly used technique of placing code in each basic block. Program 
number of times each basic block in a program executes. Instruction trac 
sequence of basic blocks traversed in a program execution. The algoritht 
placement of counting/tracing code with respect to the ... 

We describe a new approach to performance debugging that focuses on i
identifying computation transformations to reduce synchronization and c 
grouping writes together into equivalence classes, we are able to tractabl 
information from long-running programs. Our performance debugger an. 
information and suggests computation transformations in terms of the so 
present the transformations suggested by the debugger on a suite of four 

Contemporary paged virtual memory systems often use associative regis
time to frequently-referenced pages. Here we examine the analogous use
registers in descriptor-based, symbolically-segmented virtual memory sys
segment contains an entire data structure as defined in a high-level langu 
data from production Algol 60 programs were used to determine perfom 
as a function of the number of associative registers in ... 

7 Active memory: a new abstraction for memory-system simulation 
Alvin R. Lebeck, David A. Wood

May 1995 ACM SIGMETRICS Performance Evaluation Review , Pro< 
1995 ACM SIGMETRICS joint international conference on 
Volume 23 Issue 1

This paper describes the active memory abstraction for memory-system j
abstraction — designed specifically for on-the-fly simulation, memory rel 
invoke a user-specified function depending upon the reference's type anc 
block state. Active memory allows simulator writers to specify the appro 
each reference, including "no action" for the common case of cache hits. 

8 Experience with a software-defined machine architecture
David W. Wall

May 1992 ACM Transactions on Programming Languages and System 

We have built a system in which the compiler back end and the linker w«
present an abstract machine at a considerably higher level than the actua 
intermediate language translated by the back end is the target language o 
compilers and is also the only assembly language generally available. Tl: 
intermodule register allocation, which would be harder if some of the cO" 
had come from a traditional assembler, out of sight of ...

Keywords: RISC, graph coloring, intermediate language, interprocedura 
pipeline scheduling, profiling, register allocation, register windows 

EEL (Executable Editing Library) is a library for building tools to analy;
executable (compiled) program. The systems and languages communitie 
tools for error detection, fault isolation, architecture translation, perform; 
simulation, and optimization using this approach of modifying executabl 
however, tools of this sort are difficult and time-consuming to write and 
tied to a particular machine and operating sy ... 

Automation is the key to the design of future embedded systems as it pei
specific customization while keeping design costs low. A key problem fi 
design systems is evaluating the performance of the vast number of alter 
timely manner. For this paper, we focus on an embedded system consisti 
components: a VLIW processor, instruction cache, data cache, and secor 
cache. A hierarchical approach of partitioning the ... 

The CPU cycle time of a high-performance processor is usually determii
time of the primary cache. As processor speeds increase, designers will
number of pipeline stages used to fetch data from the cache in order to n 
dependence of CPU cycle time on cache access time. This paper studies 
advantages of a pipelined cache for a GaAs implementation of the MIPS 
using a design methodology that includes long traces of ... 

We present an Incremental State Saving technique for which the state sa-
inserted automatically by directly editing the application executable. Thi 
advantage of being easy to use since it is fully automatic, and has good p 
adds overhead only where state is being modified. Since the editing hap{ 
code, the method is independent of the compiler, and allows third party 1 
None of the previous incremental state saving ... 

In this paper we present SIGMA (Simulation Infrastructure to Guide Me
new data collection framework and family of cache analysis tools. The S 
provides detailed cache information by gathering memory reference data 
based instrumentation. This infrastructure can facilitate quick probing in 
influence the performance of an application by highlighting bottleneck si 
excessive cache/TLB misses and inefficient data layou ... 